In machine learning, a confusion matrix, also known as an error matrix, is a specific table layout that allows visualization of the performance of an algorithm, typically a supervised learning one; in unsupervised learning it is usually called a matching matrix. The term is used specifically in the problem of statistical classification.
Each row of the matrix represents the instances in an actual class while each column represents the instances in a predicted class, or vice versa – both variants are found in the literature. The diagonal of the matrix therefore represents all instances that are correctly predicted. The name stems from the fact that it makes it easy to identify whether the system is confusing two classes (i.e., commonly mislabeling one class as another). The confusion matrix has its origins in human perceptual studies of auditory stimuli. It was adapted for machine learning studies and used by Frank Rosenblatt, among other early researchers, to compare human and machine classifications of visual (and later auditory) stimuli.
It is a special kind of contingency table, with two dimensions ("actual" and "predicted"), and identical sets of "classes" in both dimensions (each combination of dimension and class is a variable in the contingency table).
We can then compare the actual and predicted classifications and add this information to the table, marking correct results in green so they are easier to identify.
| Total population = P + N | Predicted positive (PP) | Predicted negative (PN) |
| --- | --- | --- |
| Actual positive (P) | True positive (TP) | False negative (FN) |
| Actual negative (N) | False positive (FP) | True negative (TN) |
Source: O'Reilly. ISBN 9781449361327.
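To make the tallying concrete, here is a minimal Python sketch of how the four cells can be counted from paired labels; the function name and the boolean encoding (True for the positive class) are choices made for this illustration, not part of the standard definition:

```python
def confusion_counts(actual, predicted):
    """Tally the four cells of a binary confusion matrix.

    `actual` and `predicted` are parallel sequences of booleans,
    with True marking the positive class (an encoding assumed
    here for brevity).
    """
    tp = sum(a and p for a, p in zip(actual, predicted))          # true positives
    fn = sum(a and not p for a, p in zip(actual, predicted))      # false negatives
    fp = sum(not a and p for a, p in zip(actual, predicted))      # false positives
    tn = sum(not a and not p for a, p in zip(actual, predicted))  # true negatives
    return tp, fn, fp, tn
```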
The color convention of the three data tables above was chosen to match this confusion matrix, so that the data are easy to differentiate.
Now, we can simply total up each type of result, substitute into the template, and create a confusion matrix that will concisely summarize the results of testing the classifier:
| Total 8 + 4 = 12 | Predicted cancer (7) | Predicted non-cancer (5) |
| --- | --- | --- |
| Actual cancer (8) | 6 | 2 |
| Actual non-cancer (4) | 1 | 3 |
In this confusion matrix, of the 8 samples with cancer, the system judged that 2 were cancer-free, and of the 4 samples without cancer, it predicted that 1 did have cancer. All correct predictions are located in the diagonal of the table (highlighted in green), so it is easy to visually inspect the table for prediction errors, as values outside the diagonal represent them. By summing the two rows of the confusion matrix, one can also deduce the total number of positive (P) and negative (N) samples in the original dataset, i.e. P = 6 + 2 = 8 and N = 1 + 3 = 4.
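This bookkeeping can be checked mechanically. A short sketch using the cancer example above (a hypothetical dataset used purely for illustration):

```python
# Rows are actual classes, columns are predicted classes,
# matching the table above.
matrix = [[6, 2],   # actual cancer:     6 predicted cancer, 2 predicted non-cancer
          [1, 3]]   # actual non-cancer: 1 predicted cancer, 3 predicted non-cancer

P = sum(matrix[0])                     # actual positives: 6 + 2 = 8
N = sum(matrix[1])                     # actual negatives: 1 + 3 = 4
correct = matrix[0][0] + matrix[1][1]  # diagonal entries: 6 + 3 = 9
accuracy = correct / (P + N)           # 9 / 12 = 0.75
```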
For example, if there were 95 cancer samples and only 5 non-cancer samples in the data, a particular classifier might classify all the observations as having cancer. The overall accuracy would be 95%, but on closer inspection the classifier would have a 100% recognition rate (sensitivity) for the cancer class and a 0% recognition rate for the non-cancer class. The F1 score is even less reliable in such cases, here yielding over 97.4%, whereas informedness removes such bias and yields 0 as the probability of an informed decision for any form of guessing (here, always guessing cancer).
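These numbers follow directly from the four cell counts. A sketch of the arithmetic for the degenerate always-cancer classifier described above:

```python
# 95 cancer and 5 non-cancer samples, all predicted as cancer.
tp, fn, fp, tn = 95, 0, 5, 0

accuracy     = (tp + tn) / (tp + fn + fp + tn)   # 0.95
sensitivity  = tp / (tp + fn)                    # 1.0  (cancer recognition rate)
specificity  = tn / (tn + fp)                    # 0.0  (non-cancer recognition rate)
precision    = tp / (tp + fp)                    # 0.95
f1           = 2 * precision * sensitivity / (precision + sensitivity)  # ~0.9744
informedness = sensitivity + specificity - 1     # 0.0
```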
According to Davide Chicco and Giuseppe Jurman, the most informative metric to evaluate a confusion matrix is the Matthews correlation coefficient (MCC).
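For reference, the MCC is computed from the four cells as (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)). A minimal sketch, using the common convention (an assumption here) of returning 0 when the denominator vanishes:

```python
from math import sqrt

def mcc(tp, fn, fp, tn):
    """Matthews correlation coefficient, ranging from -1 to +1."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Return 0 when a row or column of the matrix is empty
    # (a common convention, assumed here).
    return (tp * tn - fp * fn) / denom if denom else 0.0

print(mcc(95, 0, 5, 0))  # 0.0 for the always-cancer classifier above
```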
Other metrics can be included in a confusion matrix, each with its own significance and uses.
Some researchers have argued that the confusion matrix, and the metrics derived from it, do not truly reflect a model's knowledge. In particular, the confusion matrix cannot show whether correct predictions were reached through sound reasoning or merely by chance (a problem known in philosophy as epistemic luck). It also does not capture situations where the facts used to make a prediction later change or turn out to be wrong (defeasibility). This means that while the confusion matrix is a useful tool for measuring classification performance, it may give an incomplete picture of a model’s true reliability.
One such extension is the Transport-based Confusion Matrix (TCM), which builds on the theory of optimal transport and the principle of maximum entropy. TCM applies to single-label, multi-label, and soft-label settings. It retains the familiar structure of the standard confusion matrix: a square matrix sized by the number of classes, with diagonal entries indicating correct predictions and off-diagonal entries indicating confusion. In the single-label case, TCM is identical to the standard confusion matrix.
TCM follows the same reasoning as the standard confusion matrix: if class A is overestimated (its predicted value is greater than its label value) and class B is underestimated (its predicted value is less than its label value), A is considered confused with B, and the entry (B, A) is increased. If a class is both predicted and present, it is correctly identified, and the diagonal entry (A, A) increases. Optimal transport and maximum entropy are used to determine the extent to which these entries are updated.
TCM enables clearer comparison between predictions and labels in complex classification tasks, while maintaining a consistent matrix format across settings.
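The update rule described above can be sketched in code. The sketch below substitutes simple proportional allocation for the optimal-transport/maximum-entropy computation, so it should be read as an illustration of the bookkeeping only, not as the TCM algorithm itself; the function name and the probability-vector encoding are assumptions:

```python
import numpy as np

def tcm_update(label, pred):
    """One sample's contribution to a transport-based confusion matrix.

    `label` and `pred` are probability vectors over the k classes
    (soft labels). Rows index underestimated (actual) classes, columns
    index overestimated (predicted) classes, as in the standard matrix.
    """
    label = np.asarray(label, dtype=float)
    pred = np.asarray(pred, dtype=float)
    k = len(label)
    m = np.zeros((k, k))
    # Mass that is both present and predicted lands on the diagonal.
    np.fill_diagonal(m, np.minimum(label, pred))
    deficit = np.maximum(label - pred, 0.0)  # underestimated classes (B)
    surplus = np.maximum(pred - label, 0.0)  # overestimated classes (A)
    if surplus.sum() > 0:
        # Proportional allocation: each underestimated class B receives
        # surplus mass from each overestimated class A at entry (B, A).
        # The actual TCM determines this coupling via optimal transport
        # and the maximum-entropy principle.
        m += np.outer(deficit, surplus) / surplus.sum()
    return m
```

For a hard single-label case, say label = [1, 0] and pred = [0, 1], this yields a single unit of mass at entry (0, 1), consistent with the claim that TCM reduces to the standard confusion matrix in the single-label setting.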